Can Entropic Regularization Be Replaced by Squared Euclidean Distance Plus Additional Linear Constraints?

Author

  • Manfred K. Warmuth
Abstract

There are two main families of on-line algorithms, depending on whether a relative entropy or a squared Euclidean distance is used as a regularizer. The difference between the two families can be dramatic. The question is whether one can always achieve comparable performance by replacing the relative entropy regularization by the squared Euclidean distance plus additional linear constraints. We formulate a simple open problem along these lines for the case of learning disjunctions.

Assume the target concept is a k-literal disjunction over n variables. The instances are bit vectors x ∈ {0, 1}^n, and the disjunction V_{i_1} ∨ V_{i_2} ∨ … ∨ V_{i_k} is true on instance x iff at least one bit in the positions i_1, i_2, …, i_k is one. We can represent the above disjunction as a weight vector w: all relevant weights w_{i_j} are set to some threshold θ > 0 and the remaining n − k irrelevant weights are zero. Now the disjunction is a linear threshold function: the disjunction is true on x iff w · x ≥ θ.

The following type of on-line algorithm makes at most O(k log n) mistakes on sequences of examples (x_1, y_1), (x_2, y_2), …, when the labels y_t are consistent with a k-literal monotone disjunction.¹ The algorithm predicts true on instance x_t iff w_t · x_t ≥ θ. The weight vector w_t for predicting at trial t is determined by minimizing the relative entropy to the initial weight vector w_1 subject to some linear constraints implied by the examples. Here the relative entropy is defined as

    Δ(w, w_1) = ∑_i ( w_i ln(w_i / w_{1,i}) + w_{1,i} − w_i ).

More precisely, w_t := argmin_w Δ(w, w_1) subject to the following example constraints (where θ, α > 0 are fixed):

– w · x_q = 0, for all 1 ≤ q < t with y_q = false,
– w · x_q ≥ αθ, for all 1 ≤ q < t with y_q = true.

This algorithm is a variant of the Winnow algorithm [Lit88] which, for w_1 = (1, …, 1), α = e and θ = n/e, makes at most e + ke ln n mistakes on any sequence of examples that is consistent with a k out of n literal disjunction.² The crucial fact is that the mistake bound of Winnow and its variants grows logarithmically in the number of variables, whereas the mistake bound of the Perceptron algorithm is Ω(kn) [KWA97]. The question is: what is responsible for this dramatic difference?

¹ For the sake of simplicity we only consider the noise-free case.
² An elegant proof of this bound was first given in [LW04] for the case when the additional constraint ∑_i w_i = 1 is enforced: for w_1 = (1/n, …, 1/n), α = e and θ = 1/(ek), this algorithm makes at most ek ln n mistakes.
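To make the multiplicative mechanics behind this scheme concrete, here is a minimal sketch (in Python, with illustrative names) of the classic promotion/elimination Winnow variant, instantiated with the parameters quoted above (w_1 = (1, …, 1), α = e, θ = n/e). It is a simplification, not the exact entropy-projection algorithm: zeroing the active weights on a false positive enforces the equality constraints w · x_q = 0 directly, while multiplying the active weights by α on a false negative is an incremental step toward the inequality constraints.

```python
import numpy as np

def winnow_variant(examples, n, alpha=np.e):
    """Winnow-style learner for k-literal monotone disjunctions.

    examples: iterable of (x, y) pairs with x a 0/1 vector of length n
    and y a boolean label.  Returns the number of prediction mistakes.
    """
    theta = n / np.e                   # threshold theta = n/e from the abstract
    w = np.ones(n)                     # uniform start w_1 = (1, ..., 1)
    mistakes = 0
    for x, y in examples:
        y_hat = bool(w @ x >= theta)   # predict true iff w . x >= theta
        if y_hat != y:
            mistakes += 1
            if y:                      # false negative: promote active weights
                w[x == 1] *= alpha
            else:                      # false positive: eliminate active weights,
                w[x == 1] = 0.0        # enforcing w . x_q = 0 exactly
    return mistakes

# Illustrative run: a 3-literal disjunction over n = 100 variables.
rng = np.random.default_rng(0)
n, k = 100, 3
relevant = rng.choice(n, size=k, replace=False)
stream = []
for _ in range(1000):
    x = (rng.random(n) < 0.05).astype(int)
    stream.append((x, bool(x[relevant].any())))
print(winnow_variant(stream, n))       # stays well below e + ke ln n ≈ 40
```

For these parameters the classic Winnow1 analysis, which bounds the mistakes by αk(log_α θ + 1) + n/θ, evaluates to ek(ln n − 1 + 1) + e = e + ke ln n, matching the bound quoted in the abstract.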


Similar articles

The limits of squared Euclidean distance regularization

Some of the simplest loss functions considered in Machine Learning are the square loss, the logistic loss and the hinge loss. The most common family of algorithms, including Gradient Descent (GD) with and without Weight Decay, always predict with a linear combination of the past instances. We give a random construction for sets of examples where the target linear weight vector is trivial to lea...
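The claim that Gradient Descent always predicts with a linear combination of the past instances is easy to check numerically. Below is a small illustrative sketch (square loss, plain GD; all names are mine): each update adds a multiple of the current instance, so the weight vector provably stays in the span of the instances.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 50))   # 20 past instances in R^50
y = rng.standard_normal(20)

w = np.zeros(50)                    # GD weight vector
coeffs = np.zeros(20)               # coefficients of the span representation
eta = 0.01
for _ in range(100):
    for t in range(20):
        err = X[t] @ w - y[t]       # square-loss residual on instance t
        w -= eta * err * X[t]       # GD step: a multiple of x_t
        coeffs[t] -= eta * err      # mirror the same step in span form

print(np.allclose(w, coeffs @ X))   # True: w = sum_t coeffs[t] * x_t
```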


Csiszár's Divergences for Non-negative Matrix Factorization: Family of New Algorithms

In this paper we discuss a wide class of loss (cost) functions for non-negative matrix factorization (NMF) and derive several novel algorithms with improved efficiency and robustness to noise and outliers. We review several approaches which allow us to obtain generalized forms of multiplicative NMF algorithms and unify some existing algorithms. We also give the flexible and relaxed form of the N...
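For reference, the baseline that such generalized algorithms extend is the classic Lee–Seung multiplicative update for NMF under the squared Euclidean loss. A minimal sketch (illustrative only, not the Csiszár-divergence updates derived in the paper):

```python
import numpy as np

def nmf_multiplicative(V, r, iters=200, eps=1e-9):
    """Multiplicative-update NMF for V ≈ W @ H under squared Euclidean loss."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H stays elementwise nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W stays elementwise nonnegative
    return W, H
```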


Linear plus fractional multiobjective programming problem with homogeneous constraints using fuzzy approach

We develop an algorithm for the solution of a multiobjective linear plus fractional programming problem (MOL+FPP) when some of the constraints are homogeneous in nature. Using the homogeneous constraints, we first construct a transformation matrix T which transforms the given problem into another MOL+FPP with fewer constraints. Then, a relationship between these two problems, ensuring that the solu...


Smooth and Sparse Optimal Transport

Entropic regularization is quickly emerging as a new standard in optimal transport (OT). It makes it possible to cast the OT computation as a differentiable and unconstrained convex optimization problem, which can be efficiently solved using the Sinkhorn algorithm. However, the entropy term keeps the transportation plan strictly positive and therefore completely dense, unlike unregularized OT. This lack ...
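For orientation, the Sinkhorn algorithm mentioned here alternates two marginal-matching scalings of a Gibbs kernel. A minimal generic sketch (illustrative names; this is plain entropic OT, not the smoothed/sparse formulations the paper develops):

```python
import numpy as np

def sinkhorn(a, b, C, gamma=0.1, iters=200):
    """Entropy-regularized OT between histograms a and b with cost matrix C."""
    K = np.exp(-C / gamma)              # Gibbs kernel; strictly positive
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)               # scale to match column marginals
        u = a / (K @ v)                 # scale to match row marginals
    return u[:, None] * K * v[None, :]  # transport plan: dense, as noted above
```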


Generalized Linear Model Regression under Distance-to-set Penalties

Estimation in generalized linear models (GLM) is complicated by the presence of constraints. One can handle constraints by maximizing a penalized log-likelihood. Penalties such as the lasso are effective in high dimensions, but often lead to unwanted shrinkage. This paper explores instead penalizing the squared distance to constraint sets. Distance penalties are more flexible than algebraic and...
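The appeal of a squared distance-to-set penalty is that it stays smooth: for a closed convex set C its gradient is 2ρ(β − proj_C(β)), so it plugs directly into gradient-based GLM fitting. A small hedged sketch (illustrative names, with the nonnegative orthant as the example constraint set):

```python
import numpy as np

def distance_penalty_grad(beta, project, rho):
    """Gradient of rho * dist^2(beta, C), with `project` the projection onto C."""
    return 2.0 * rho * (beta - project(beta))

beta = np.array([0.5, -0.3, 1.2])
grad = distance_penalty_grad(beta, lambda b: np.maximum(b, 0.0), rho=1.0)
print(grad)   # only the coordinate violating beta >= 0 incurs a gradient
```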



Published in: COLT 2006 (G. Lugosi and H.U. Simon, Eds.), LNAI 4005, Springer-Verlag Berlin Heidelberg

Pages: 653–654

Publication date: 2006